nuID: a universal naming scheme of oligonucleotides for Illumina, Affymetrix, and other microarrays

Author: Du Pan
Kibbe Warren A
Lin Simon M
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract Background Oligonucleotide probes that are sequence identical may have different identifiers between manufacturers and even between different versions of the same company's microarray; and sometimes the same identifier is reused and represents a completely different oligonucleotide, resulting in ambiguity and potentially mis-identification of the genes hybridizing to that probe. Results We have devised a unique, non-degenerate encoding scheme that can be used as a universal representation to identify an oligonucleotide across manufacturers. We have named the encoded representation 'nuID', for nucleotide universal identifier. Inspired by the fact that the raw sequence of the oligonucleotide is the true definition of identity for a probe, the encoding algorithm uniquely and non-degenerately transforms the sequence itself into a compact identifier (a lossless compression). In addition, we added a redundancy check (checksum) to validate the integrity of the identifier. These two steps, encoding plus checksum, result in an nuID, which is a unique, non-degenerate, permanent, robust and efficient representation of the probe sequence. For commercial applications that require the sequence identity to be confidential, we have an encryption scheme for nuID. We demonstrate the utility of nuIDs for the annotation of Illumina microarrays, and we believe it has universal applicability as a source-independent naming convention for oligomers. Reviewers This article was reviewed by Itai Yanai, Rong Chen (nominated by Mark Gerstein), and Gregory Schuler (nominated by David Lipman).
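
The abstract describes two steps: a lossless encoding of the probe sequence into a compact string, followed by a checksum. The sketch below illustrates that general idea under assumptions of our own (2 bits per nucleotide, three nucleotides per symbol of a hypothetical 64-character alphabet, a toy sum-of-codes checksum, and one header symbol carrying the checksum and padding count). It is not the published nuID algorithm, and the strings it produces will not match real nuIDs.

```python
# Illustrative sketch only; alphabet, checksum and header layout are assumptions,
# not the published nuID specification.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_."  # 64 symbols
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per nucleotide
CODE_BASE = "ACGT"


def _checksum(seq: str) -> int:
    """Toy checksum (assumption): sum of the 2-bit base codes modulo 16."""
    return sum(BASE_CODE[b] for b in seq) % 16


def encode(seq: str) -> str:
    """Losslessly pack a DNA sequence into one header symbol plus a body."""
    seq = seq.upper()
    if not seq or any(b not in BASE_CODE for b in seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    pad = (-len(seq)) % 3  # padding bases needed to reach a multiple of 3
    padded = seq + "A" * pad
    body = []
    for i in range(0, len(padded), 3):
        # Three 2-bit codes form one 6-bit value, i.e. one alphabet symbol.
        value = (BASE_CODE[padded[i]] << 4) | (BASE_CODE[padded[i + 1]] << 2) | BASE_CODE[padded[i + 2]]
        body.append(ALPHABET[value])
    # Header symbol stores the checksum (0-15) and the pad count (0-2).
    header = ALPHABET[_checksum(seq) * 4 + pad]
    return header + "".join(body)


def decode(identifier: str) -> str:
    """Recover the sequence and verify the checksum; raise ValueError if corrupted."""
    if len(identifier) < 2:
        raise ValueError("identifier too short")
    # str.index raises ValueError for symbols outside the alphabet.
    checksum, pad = divmod(ALPHABET.index(identifier[0]), 4)
    if pad > 2:
        raise ValueError("invalid padding field")
    bases = []
    for symbol in identifier[1:]:
        value = ALPHABET.index(symbol)
        bases.extend((CODE_BASE[value >> 4], CODE_BASE[(value >> 2) & 3], CODE_BASE[value & 3]))
    seq = "".join(bases)
    if pad:
        seq = seq[:-pad]
    if not seq or _checksum(seq) != checksum:
        raise ValueError("checksum mismatch: identifier is corrupted")
    return seq
```

Under these assumptions `encode("ACGTACGT")` gives `"xGxs"`, and `decode` maps it back to the original sequence. A sum-based checksum catches single-symbol changes but not two bases swapping places, so a production scheme would want a stronger check.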

A Comprehensive Infrastructure for Big Data in Cancer Research: Accelerating Cancer Research and Precision Medicine

Author: Anthony R. Kerlavage
Ishwar Chandramouliswaran
Izumi V. Hinkson
Juli D. Klemm
Tanja M. Davidsen
Warren A. Kibbe
Publication venue: Frontiers Media SA
Publication date: 01/09/2017

Advancements in next-generation sequencing and other -omics technologies are accelerating the detailed molecular characterization of individual patient tumors, and driving the evolution of precision medicine. Cancer is no longer considered a single disease, but rather, a diverse array of diseases wherein each patient has a unique collection of germline variants and somatic mutations. Molecular profiling of patient-derived samples has led to a data explosion that could help us understand the contributions of environment and germline to risk, therapeutic response, and outcome. To maximize the value of these data, an interdisciplinary approach is paramount. The National Cancer Institute (NCI) has initiated multiple projects to characterize tumor samples using multi-omic approaches. These projects harness the expertise of clinicians, biologists, computer scientists, and software engineers to investigate cancer biology and therapeutic response in multidisciplinary teams. Petabytes of cancer genomic, transcriptomic, epigenomic, proteomic, and imaging data have been generated by these projects. To address the data analysis challenges associated with these large datasets, the NCI has sponsored the development of the Genomic Data Commons (GDC) and three Cloud Resources. The GDC ensures data and metadata quality, ingests and harmonizes genomic data, and securely redistributes the data. During its pilot phase, the Cloud Resources tested multiple cloud-based approaches for enhancing data access, collaboration, computational scalability, resource democratization, and reproducibility. These NCI-led efforts are continuously being refined to better support open data practices and precision oncology, and to serve as building blocks of the NCI Cancer Research Data Commons.

A collection of bioconductor methods to visualize gene-list annotations

Author: Du Pan
Feng Gang
Kibbe Warren A
Krett Nancy L
Lin Simon M
Rosen Steven
Tessel Michael
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background Gene-list annotations are critical for researchers to explore the complex relationships between genes and functionalities. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As such, potentially biologically important complexities such as one gene belonging to multiple annotation categories are difficult to extract. We have devised explicit and efficient visualization methods that provide an intuitive way to interrogate the intrinsic connections between biological categories and genes. Findings We have constructed a data model and now present two novel methods in a Bioconductor package, "GeneAnswers", to simultaneously visualize genes, concepts (a.k.a. annotation categories), and concept-gene connections (a.k.a. annotations): the "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation". These methods have been tested and validated with microarray-derived gene lists. Conclusions These new visualization methods can effectively present annotations using Gene Ontology, Disease Ontology, or any other user-defined gene annotations that have been pre-associated with an organism's genome by human curation, automated pipelines, or a combination of the two. The gene-annotation data model and associated methods are available in the Bioconductor package called "GeneAnswers" described in this publication.
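
GeneAnswers itself is an R/Bioconductor package. The Python sketch below only illustrates the data model the abstract describes (concepts, genes, and concept-gene links) and shows how a cross tabulation makes a gene that belongs to several categories easy to spot. The function names and the toy annotation dictionary are hypothetical and are not part of the GeneAnswers API.

```python
from typing import Dict, List, Set, Tuple

# Toy concept -> gene annotations (illustrative only, not curated data).
TOY_ANNOTATIONS: Dict[str, Set[str]] = {
    "apoptosis": {"TP53", "BAX", "CASP3"},
    "cell cycle": {"TP53", "CDK2"},
}


def concept_gene_edges(annotations: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Edges of a concept-and-gene network: one (concept, gene) pair per annotation."""
    return [(c, g) for c in sorted(annotations) for g in sorted(annotations[c])]


def cross_tabulate(annotations: Dict[str, Set[str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Gene x concept 0/1 matrix: rows are genes, columns are concepts."""
    concepts = sorted(annotations)
    genes = sorted(set().union(*annotations.values()))
    matrix = [[int(g in annotations[c]) for c in concepts] for g in genes]
    return genes, concepts, matrix


def multi_category_genes(annotations: Dict[str, Set[str]]) -> List[str]:
    """Genes annotated to more than one concept."""
    counts: Dict[str, int] = {}
    for genes in annotations.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    return sorted(g for g, n in counts.items() if n > 1)


if __name__ == "__main__":
    genes, concepts, matrix = cross_tabulate(TOY_ANNOTATIONS)
    print("gene", *concepts, sep="\t")
    for gene, row in zip(genes, matrix):
        print(gene, *row, sep="\t")
    print("in several categories:", multi_category_genes(TOY_ANNOTATIONS))
```

In the toy table, TP53 is the only row with a 1 in both columns, which is exactly the kind of overlap that a plain per-category barplot hides.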

dictyBase, the model organism database for Dictyostelium discoideum

Author: Chisholm Rex L.
Fey Petra
Gaudet Pascale
Just Eric M.
Kibbe Warren A.
Merchant Sohel N.
Pilcher Karen E.
Publication venue: Oxford University Press
Publication date: 01/01/2005

dictyBase is the model organism database (MOD) for the social amoeba Dictyostelium discoideum. The unique biology and phylogenetic position of Dictyostelium offer a great opportunity to gain knowledge of processes not characterized in other organisms. The recent completion of the 34 Mb genome sequence, together with the sizable scientific literature using Dictyostelium as a research organism, provided the necessary tools to create a well-annotated genome. dictyBase has leveraged software developed by the Saccharomyces Genome Database and the Generic Model Organism Database project. This has reduced the time required to develop a full-featured MOD and greatly facilitated our ability to focus on annotation and providing new functionality. We hope that manual curation of the Dictyostelium genome will facilitate the annotation of other genomes.

xanthusBase: adapting wikipedia principles to a model organism database

Author: Arshinoff Bradley I.
Chisholm Rex L.
Just Eric M.
Kibbe Warren A.
Merchant Sohel M.
Suen Garret
Welch Roy D.
Publication venue: Oxford University Press
Publication date: 07/11/2006

xanthusBase is the official model organism database (MOD) for the social bacterium Myxococcus xanthus. In many respects, M. xanthus represents the pioneer model organism (MO) for studying the genetic, biochemical, and mechanistic basis of prokaryotic multicellularity, a topic that has garnered considerable attention due to the significance of biofilms in both basic and applied microbiology research. To facilitate its utility, the design of xanthusBase incorporates open-source software, leveraging the cumulative experience made available through the Generic Model Organism Database (GMOD) project, MediaWiki, and dictyBase, to create a MOD that is both highly useful and easily navigable. In addition, we have incorporated a unique Wikipedia-style curation model which exploits the internet's inherent interactivity, thus enabling M. xanthus and other myxobacterial researchers to contribute directly toward the ongoing genome annotation.

Annotating the human genome with Disease Ontology

Author: Chisholm Rex L
Danila Maria I
Feng Gang
Flatow Jared
Holko Michelle
Kibbe Warren A
Lin Simon M
Osborne John D
Zhu Lihua (Julie)
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract Background The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease-containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of the human genome.
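
The abstract reports recall and precision against a reference gene collection. As a reminder of how the two figures relate, precision = TP / (TP + FP) (the share of predicted annotations that are correct) and recall = TP / (TP + FN) (the share of reference annotations that were found). The minimal sketch below assumes each annotation is a (gene, disease) pair; that representation and the toy identifiers are assumptions, not the paper's data or evaluation code.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (gene, disease); this representation is an assumption


def precision_recall(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Set-based precision and recall; returns 0.0 when a denominator is empty."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy annotations, not taken from the paper.
    predicted = {("GENE1", "DISEASE_A"), ("GENE2", "DISEASE_B"), ("GENE3", "DISEASE_Z")}
    reference = {("GENE1", "DISEASE_A"), ("GENE2", "DISEASE_B"),
                 ("GENE4", "DISEASE_C"), ("GENE5", "DISEASE_D")}
    p, r = precision_recall(predicted, reference)
    # 2 of 3 predictions are correct; 2 of 4 reference pairs are recovered.
    print(f"precision={p:.2f} recall={r:.2f}")
```

The same arithmetic explains the contrast drawn in the abstract: a source can be highly precise (few false positives) yet have low recall if it simply covers few of the reference gene-disease links.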

dictyBase—a Dictyostelium bioinformatics resource update

Author: Basu Siddhartha
Bushmanova Yulia A.
Chisholm Rex L.
Curk Tomaz
Fey Petra
Gaudet Pascale
Just Eric M.
Kibbe Warren A.
Merchant Sohel N.
Shaulsky Gad
Zupan Blaz
Publication venue: Oxford University Press
Publication date: 01/01/2009

dictyBase (http://dictybase.org) is the model organism database for Dictyostelium discoideum. It houses the complete genome sequence, ESTs and the entire body of literature relevant to Dictyostelium. This information is curated to provide accurate gene models and functional annotations, with the goal of fully annotating the genome. This dictyBase update describes the annotations and features implemented since 2006, including improved strain and phenotype representation, integration of predicted transcriptional regulatory elements, protein domain information, biochemical pathways, improved searching and a wiki tool that allows members of the research community to provide annotations.

Cancer Informatics for Cancer Centers (CI4CC): Building a Community Focused on Sharing Ideas and Best Practices to Improve Cancer Care and Patient Outcomes.

Author: Barnholtz-Sloan Jill S
Basu Amrita
Borowsky Alexander D
Bui Alex
DiGiovanna Jack
Garcia-Closas Montserrat
Genkinger Jeanine M
Gerke Travis
Induni Marta
Kibbe Warren A
Lacey James V, Jr
Mirel Lisa
Nadaf Sorena
Permuth Jennifer B
Rollison Dana E
Saltz Joel
Shenkman Elizabeth A
Ulrich Cornelia M
Zheng W Jim
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

Cancer Informatics for Cancer Centers (CI4CC) is a grassroots, nonprofit 501(c)(3) organization intended to provide a focused national forum for engagement of senior cancer informatics leaders, primarily aimed at academic cancer centers anywhere in the world but with a special emphasis on the 70 National Cancer Institute-funded cancer centers. Although each of the participating cancer centers is structured differently, and leaders' titles vary, we know firsthand there are similarities in both the issues we face and the solutions we achieve. As a consortium, we have initiated a dedicated listserv, an open-initiatives program, and targeted biannual face-to-face meetings. These meetings are a place to review our priorities and initiatives, providing a forum for discussion of the strategic and pragmatic issues we, as informatics leaders, individually face at our respective institutions and cancer centers. Here we provide a brief history of the CI4CC organization and meeting highlights from the latest CI4CC meeting that took place in Napa, California from October 14-16, 2019. The focus of this meeting was "intersections between informatics, data science, and population science." We conclude with a discussion on "hot topics" on the horizon for cancer informatics.

Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis

Author: Chiang-Ching Huang
Lifang Hou
Nadereh Jafari
Pan Du
Simon M Lin
Warren A Kibbe
Xiao Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Mining the Gene Wiki for functional genomic knowledge

Author: Andrew I Su
Benjamin M Good
Douglas G Howe
Simon M Lin
Warren A Kibbe
Publication venue: BioMed Central
Publication date: 01/12/2011

Abstract Background Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki comprises more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology. Results Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses. Conclusions The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.
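
The abstract does not spell out how article text is mapped to ontology terms. One common baseline, shown here purely as an illustration and not as the system used in the paper, is dictionary lookup: each ontology term is listed with its name and synonyms, and any whole-phrase match in the text yields a candidate annotation. The lexicon and the `TERM:000N` identifiers below are hypothetical placeholders, not real Gene Ontology or Disease Ontology IDs.

```python
import re
from typing import Dict, Set


def annotate(text: str, lexicon: Dict[str, str]) -> Set[str]:
    """Return IDs of lexicon terms whose name or synonym occurs as a whole phrase in text."""
    lowered = text.lower()
    found: Set[str] = set()
    for phrase, term_id in lexicon.items():
        # Word boundaries keep a phrase from matching inside a longer word.
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.add(term_id)
    return found


# Hypothetical lexicon: several surface forms map to one placeholder term ID.
LEXICON = {
    "breast cancer": "TERM:0001",
    "mammary carcinoma": "TERM:0001",
    "apoptosis": "TERM:0002",
    "programmed cell death": "TERM:0002",
}

if __name__ == "__main__":
    sentence = "Variants in this gene are linked to mammary carcinoma and reduced programmed cell death."
    print(sorted(annotate(sentence, LEXICON)))
```

Synonym lists let different wordings resolve to the same term, but exact matching still produces false positives (for example, a term mentioned only in passing), which is one reason text-derived annotations need the kind of manual precision estimates reported above.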
